NetNews Offline 2

Internet Message Format | 1996-08-05 | 2.3 KB

Path: news.infi.net!usenet
From: nngis@norfolk.infi.net (Greg DiGiorgio)
Newsgroups: comp.lang.c
Subject: Re: Sorting large files
Date: 25 Jan 1996 20:44:24 GMT
Organization: Customer of InfiNet
Message-ID: <4e8q38$4va@nw002.infi.net>
References: <4e8j9b$cuf@longwood.cs.ucf.edu>
Reply-To: nngis@norfolk.infi.net
NNTP-Posting-Host: h-standbyme.norfolk.infi.net
Mime-Version: 1.0
X-Newsreader: WinVN 0.99.3

In article <4e8j9b$cuf@longwood.cs.ucf.edu>, schnitzi@longwood.cs.ucf.edu says...
>
>There was some discussion here a little while
>back on how to sort the lines in a large file
>without having to have a huge character array.
>I suggested using ftell and fseek to hunt down
>the particular lines you are comparing. Only
>just the other day did I notice (via a web
>search engine) that someone posted a question
>on how this could be done... So here goes.
>
>... snipped ...

Just throwing my 2 cents (perhaps 1.2 cents) in...

Sorting large files is a problem, especially if you try to do it all in memory. On mainframes, one buys an entire pkg devoted to sorting. In UNIX, you use the "sort" cmd. RDBMSs have to use optimized sorts to provide fast response. Whichever way you go, I cannot envision sorting large files without temporary work files to hold intermediate results.

Let's assume you have 1 million records of 100 bytes each to sort; that's 100 MB of data. Based on the number of records to sort, you could divide the input file into sets of, say, 10,000 records (100 sets here). Sort each set in memory and write it to its own work file. After accumulating a number of work files, merge them into a single new work file and del the originals. Move on to the next group of work files, merging them into another work file. Then merge the 2 sorted work files and del the ones you merged. Repeat until the whole data file is sorted.

The memory-based sort you use is your choice. Mainframe pkgs sample the data before sorting to choose the best-suited sorting method.
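
The split, sort, and merge procedure described in the post can be sketched in C roughly as follows. This is a minimal sketch under assumptions that are not in the post: records are fixed-length byte strings compared with memcmp, the in-memory step uses the standard qsort, work files come from tmpfile(), and work files are merged two at a time rather than several at once. The names external_sort, merge_two, check, and put are hypothetical, and any allocation or I/O failure simply aborts the program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only; names are hypothetical. qsort's comparator takes no
   context, so the (assumed fixed) record length lives in a global. */
static size_t g_reclen;

static int cmp_rec(const void *a, const void *b)
{
    return memcmp(a, b, g_reclen);
}

/* To keep the sketch short, any allocation or I/O failure aborts. */
static void *check(void *p)
{
    if (p == NULL) {
        fputs("external_sort: allocation or I/O failure\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

static void put(const void *p, size_t reclen, size_t n, FILE *f)
{
    if (fwrite(p, reclen, n, f) != n)
        check(NULL);
}

/* Merge two sorted work files into a new one, then close the inputs;
   tmpfile() files are deleted automatically when closed. */
static FILE *merge_two(FILE *a, FILE *b, size_t reclen)
{
    FILE *out = check(tmpfile());
    unsigned char *ra = check(malloc(reclen));
    unsigned char *rb = check(malloc(reclen));
    rewind(a);
    rewind(b);
    int have_a = fread(ra, reclen, 1, a) == 1;
    int have_b = fread(rb, reclen, 1, b) == 1;
    while (have_a || have_b) {
        /* Emit the smaller head record; ties go to `a`. */
        if (have_a && (!have_b || memcmp(ra, rb, reclen) <= 0)) {
            put(ra, reclen, 1, out);
            have_a = fread(ra, reclen, 1, a) == 1;
        } else {
            put(rb, reclen, 1, out);
            have_b = fread(rb, reclen, 1, b) == 1;
        }
    }
    free(ra);
    free(rb);
    fclose(a);
    fclose(b);
    return out;
}

/* Sort fixed-length records from `in` into `out`, never holding more
   than `run_records` records in memory at once. */
void external_sort(FILE *in, FILE *out, size_t reclen, size_t run_records)
{
    unsigned char *buf = check(malloc(reclen * run_records));
    FILE **runs = NULL;
    size_t nruns = 0, cap = 0, n, i, kept;
    g_reclen = reclen;

    /* Pass 1: sort each set of records in memory; one work file per set. */
    while ((n = fread(buf, reclen, run_records, in)) > 0) {
        if (nruns == cap) {
            cap = cap ? 2 * cap : 8;
            runs = check(realloc(runs, cap * sizeof *runs));
        }
        qsort(buf, n, reclen, cmp_rec);
        runs[nruns] = check(tmpfile());
        put(buf, reclen, n, runs[nruns]);
        nruns++;
    }

    /* Later passes: merge work files two at a time until one is left. */
    while (nruns > 1) {
        kept = 0;
        for (i = 0; i + 1 < nruns; i += 2)
            runs[kept++] = merge_two(runs[i], runs[i + 1], reclen);
        if (i < nruns)              /* an odd work file carries over */
            runs[kept++] = runs[i];
        nruns = kept;
    }

    /* Copy the one remaining, fully sorted work file to the output. */
    if (nruns == 1) {
        rewind(runs[0]);
        while ((n = fread(buf, reclen, run_records, runs[0])) > 0)
            put(buf, reclen, n, out);
        fclose(runs[0]);
    }
    free(runs);
    free(buf);
}
```

With the post's numbers you might call external_sort(in, out, 100, 10000): pass 1 writes 100 sorted work files, and pairwise merging then takes seven more passes (100, 50, 25, 13, 7, 4, 2, 1 work files), each reading and writing the full 100 MB. Merging more work files at once, as the post suggests, cuts the number of passes.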
I mean, you have bubble sort and insertion sort on the low end, quick sort and heap sort on the high end, and probably 50 other sorting algorithms I don't even know about. Sorting data is such a technical issue that mainframe RDBMSs are even resorting to sort progs implemented in hardware to speed up this common bottleneck.

Good luck on your course of action,
Greg DiGiorgio
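
The post leaves the in-memory step open and names heap sort as one of the "high end" choices. As an illustration only, here is a minimal heap sort on an array of ints; the element type and the names heap_sort and sift_down are assumptions for the sketch. It sorts in place with O(n log n) comparisons in the worst case, whereas a simple quicksort can degrade to O(n^2); in practice the C library's qsort is the usual first choice.

```c
#include <stddef.h>

/* Sketch only: int elements and these function names are assumptions.
   Restore the max-heap property for the subtree at `root` within a[0..n). */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1, largest = root;
        if (child < n && a[child] > a[largest])
            largest = child;
        if (child + 1 < n && a[child + 1] > a[largest])
            largest = child + 1;
        if (largest == root)
            return;
        int tmp = a[root];
        a[root] = a[largest];
        a[largest] = tmp;
        root = largest;
    }
}

/* In-place heap sort, ascending order. */
void heap_sort(int *a, size_t n)
{
    size_t i;
    if (n < 2)
        return;
    for (i = n / 2; i-- > 0; )      /* build a max-heap bottom-up */
        sift_down(a, i, n);
    for (i = n - 1; i > 0; i--) {   /* move the max to the end, shrink heap */
        int tmp = a[0];
        a[0] = a[i];
        a[i] = tmp;
        sift_down(a, 0, i);
    }
}
```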